AITopics | finetune task


finetune task


95e62984b87e90645a5cf77037395959-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-9-2026, 10:14:30 GMT

finetune task, influence function, reviewer, (14 more...)


Technology:

Information Technology > Data Science > Data Quality (0.56)
Information Technology > Artificial Intelligence > Machine Learning (0.53)


Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms

Xu, Ao, Zhao, Han, Cui, Weihao, Chen, Quan, Chen, Yukang, Zhang, Shulai, Chen, Shuang, Jiang, Jiemin, Yu, Zhibin, Guo, Minyi

arXiv.org Artificial Intelligence, Nov-20-2025

Large language models (LLMs) are increasingly deployed under the Model-as-a-Service (MaaS) paradigm. To meet stringent quality-of-service (QoS) requirements, existing LLM serving systems disaggregate the prefill and decode phases of inference. However, decode instances often experience low GPU utilization because decoding is memory-bound and dynamic workloads provide insufficient batching, leaving compute resources underutilized. We introduce Harli, a serving system that improves GPU utilization by co-locating parameter-efficient finetuning (PEFT) tasks with LLM decode instances. PEFT tasks are compute-bound and memory-efficient, making them ideal candidates for safe co-location. Specifically, Harli addresses two key challenges, limited memory and unpredictable interference, using three components: a unified memory allocator for runtime memory reuse, a two-stage latency predictor for modeling decode latency, and a QoS-guaranteed scheduler that maximizes finetuning throughput. Experimental results show that Harli improves finetuning throughput by 46.2% on average (up to 92.0%) over state-of-the-art serving systems, while maintaining strict QoS guarantees for inference decode.
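
The abstract describes a scheduler that admits finetuning work only when a latency predictor says the co-located decode instance will still meet its QoS target. The sketch below illustrates that general idea only; it is not Harli's scheduler. The linear interference model, the names `DecodeState`, `toy_latency_predictor`, and `pick_finetune_quota`, and all numeric constants are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class DecodeState:
    """Snapshot of a decode instance (hypothetical fields)."""
    batch_size: int        # number of in-flight decode requests
    free_memory_mb: float  # GPU memory currently available to finetuning


def toy_latency_predictor(decode_batch: int, finetune_tokens: int) -> float:
    """Assumed linear model of per-step decode latency in milliseconds.

    A real predictor would be fitted to profiling data; these
    coefficients are made up for the example.
    """
    base_ms = 10.0 + 0.5 * decode_batch
    interference_ms = 0.01 * finetune_tokens
    return base_ms + interference_ms


def pick_finetune_quota(
    state: DecodeState,
    slo_ms: float,
    candidates: Sequence[int],
    predict: Callable[[int, int], float],
    mem_per_token_mb: float,
) -> int:
    """Return the largest candidate token quota that fits in free memory
    and keeps predicted decode latency within the SLO; 0 if none does."""
    best = 0
    for tokens in sorted(candidates):
        if tokens * mem_per_token_mb > state.free_memory_mb:
            break  # larger quotas need even more memory
        if predict(state.batch_size, tokens) <= slo_ms:
            best = tokens
    return best


if __name__ == "__main__":
    state = DecodeState(batch_size=20, free_memory_mb=4096.0)
    quota = pick_finetune_quota(
        state,
        slo_ms=30.0,
        candidates=[0, 256, 512, 1024, 2048],
        predict=toy_latency_predictor,
        mem_per_token_mb=1.0,
    )
    print(f"finetune token quota for this step: {quota}")
```

Under these assumptions the quota shrinks as the decode batch grows or free memory drops, so finetuning uses spare compute without pushing the predicted decode latency past its target.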

large language model, machine learning, natural language, (21 more...)


arXiv:2511.11729

Country: